Abstract:
Computer architecture simulation tools are essential for implementing and evaluating new ideas in the domain and can be useful for understanding the behavior of programs and finding microarchitectural bottlenecks. One particularly important part of almost any processor is the cache hierarchy. While some simulators support simulating a whole processor, including the cache hierarchy, cores, and on-chip interconnect, others may only support simulating the cache hierarchy. This survey provides a detailed discussion of 28 CPU cache simulators, including popular and recent simulators. We compare all of these simulators along four dimensions: major design characteristics, support for specific cache design features, support for specific cache-related metrics, and validation methods and efforts. The strengths and shortcomings of each simulator, as well as major issues common to all simulators, are highlighted. The information presented in this survey was collected from many different sources, including research papers, documentation, source code bases, and others. This survey is potentially useful for both users and developers of cache simulators. To the best of our knowledge, this is the first comprehensive survey on cache simulation tools.
Abstract:
Multi-core processors are now widely used in embedded systems. Since the set of application programs running on an embedded system is quite limited, there must exist an optimal cache memory configuration in terms of power and area. Simulating application programs on various cache configurations is one of the best ways to determine the optimal one. Multi-core cache configuration simulation, however, is far more complicated and time-consuming than single-core cache configuration simulation. In this paper, we propose a very fast dual-core L1 cache configuration simulation algorithm. We first propose a new data structure in which a single instance represents two or more multi-core cache configurations with different cache associativities. We then propose a new multi-core cache configuration simulation algorithm that uses this data structure together with new theorems. Experimental results demonstrate that our algorithm obtains exact simulation results while running 20 times faster than a conventional approach.
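As a point of reference for the conventional approach such work speeds up, a single cache configuration can be simulated by the short Python sketch below; the names and structure are illustrative, not the paper's algorithm. Exhaustive design space exploration conventionally reruns such a loop once per configuration, which is what makes multi-core exploration so slow:

```python
from collections import OrderedDict

def simulate_cache(trace, num_sets, line_size, associativity):
    """Count hits and misses for one set-associative LRU cache configuration."""
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in trace:
        line = addr // line_size          # cache line containing the address
        index = line % num_sets           # set index
        tag = line // num_sets            # tag within the set
        ways = sets[index]
        if tag in ways:
            hits += 1
            ways.move_to_end(tag)         # refresh LRU position
        else:
            if len(ways) >= associativity:
                ways.popitem(last=False)  # evict the least recently used line
            ways[tag] = True
    return hits, len(trace) - hits
```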
Abstract:
The Web has become the most important source of information and communication for the world. Proxy servers are used to cache objects with the goals of decreasing network traffic and reducing user-perceived lag and loads on origin servers. In this paper, we focus on the cache replacement problem with respect to proxy servers. Despite the fact that some Web 2.0 applications have dynamic objects, most Web traffic has static content, with file types such as cascading style sheets, javascript files, images, etc. The cache replacement strategies implemented in Squid, widely used proxy caching software, are no longer considered 'good enough' today. Squid's default strategy is Least Recently Used. While this is a simple approach, it does not necessarily achieve the targeted goals. We simulate 27 proxy cache replacement strategies and analyze them against several important performance measures. Hit rate and byte hit rate are the most commonly used performance metrics in the literature. Hit rate is an indication of user-perceived lag, while byte hit rate is an indication of the amount of network traffic. We also introduce a new performance metric, the object removal rate, which is an indication of CPU usage and disk access at the proxy server. This metric is particularly important for busy cache servers or servers with lower processing power. Our study provides valuable insights for both industry and academia. They are especially important for Web proxy cache system administrators, particularly in wireless ad-hoc networks, as the caches on mobile devices are relatively small.
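The three metrics can be made concrete with a small Python sketch that replays a request stream through an LRU proxy cache; the function, its LRU policy, and the per-request definition of object removal rate are illustrative assumptions rather than the paper's exact formulation:

```python
from collections import OrderedDict

def proxy_metrics(requests, cache_capacity):
    """Replay (url, size) requests through an LRU proxy cache and report
    hit rate, byte hit rate, and object removal rate."""
    cache = OrderedDict()               # url -> object size in bytes
    hits = byte_hits = removals = used = total_bytes = 0
    for url, size in requests:
        total_bytes += size
        if url in cache:
            hits += 1
            byte_hits += size
            cache.move_to_end(url)      # refresh LRU position
        else:
            while cache and used + size > cache_capacity:
                _, evicted_size = cache.popitem(last=False)
                used -= evicted_size
                removals += 1           # each removal costs CPU and disk work
            if size <= cache_capacity:
                cache[url] = size
                used += size
    n = len(requests)
    return hits / n, byte_hits / total_bytes, removals / n
```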
Abstract:
In an embedded system where a single application or a class of applications is repeatedly executed on a processor, the cache configuration can be customized so that an optimal one is achieved. We can obtain an optimal cache configuration that minimizes overall memory access time by varying three cache parameters: the number of sets, the line size, and the associativity. In this paper, we first propose two cache simulation algorithms, CRCB1 and CRCB2, based on the cache inclusion property. They realize exact cache simulation while dramatically decreasing the number of cache hit/miss judgments. We further propose three cache design space exploration algorithms, CRMF1, CRMF2, and CRMF3, based on our experimental observations. They can find an almost optimal cache configuration from the viewpoint of access time. Using our approach, the number of cache hit/miss judgments required for optimizing cache configurations is reduced to 1/10-1/50 of that of conventional approaches. As a result, our proposed approach runs an average of 3.2 times faster, and a maximum of 5.3 times faster, than the fastest approach proposed so far. Our proposed cache simulation approach achieves the world's fastest cache design space exploration when optimizing total memory access time.
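The abstract does not spell out the inclusion property exploited by CRCB1/CRCB2, but a classic one-pass technique in the same spirit is Mattson's LRU stack algorithm, which evaluates all fully associative LRU cache sizes in a single pass over the trace. The Python sketch below shows that classic technique for context; it is not the paper's algorithm:

```python
def stack_distances(trace):
    """Mattson's LRU stack algorithm: one pass over the trace yields each
    access's stack distance; the hit count of a fully associative LRU
    cache with C lines is the number of accesses with distance < C."""
    stack = []                      # most recently used address first
    for addr in trace:
        if addr in stack:
            d = stack.index(addr)   # 0-based reuse (stack) distance
            stack.pop(d)
        else:
            d = None                # cold miss: a miss at every cache size
        stack.insert(0, addr)
        yield d
```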
Abstract:
Most applications in urban vehicular ad hoc networks (VANETs) rely on information sharing, such as real-time traffic information queries and advertisements. However, existing data dissemination techniques cannot guarantee satisfactory performance when large numbers of information requests come from all around the network. Because these pieces of information are useful to multiple users located at various positions, it is beneficial to spread cached copies around. Existing work has proposed caching mechanisms and conducted simulations for validation, but theoretical analysis of the explicit caching effects is lacking. Because of the complex urban environment and the high mobility of vehicles, quantifying the caching effects on VANET performance is quite challenging. We present the cache coverage ratio as a metric for the caching effects and give a theoretical analysis based on reasonable assumptions for urban VANETs, through which we find that the affecting factors include vehicle density, transmission range, and the ratio of caching vehicles. We deduce the quantitative relationship among them, which has a form similar to the cumulative distribution function of an exponential distribution. We also consider the impact of vehicle mobility to predict the future caching effect on the roads surrounding the caching area. We conduct extensive simulations, which verify that the theoretical analysis results match the simulated reality quite well under different scenarios. Copyright (c) 2015 John Wiley & Sons, Ltd.
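Since the abstract only states that the derived relationship has the form of an exponential distribution's CDF, an illustrative shape (with symbols chosen here for illustration, not taken from the paper) would be:

```latex
P_{\mathrm{cov}} \;=\; 1 - e^{-\lambda \rho R}
```

where \(\lambda\) denotes vehicle density, \(\rho\) the ratio of caching vehicles, and \(R\) the transmission range; coverage rises with each factor but with diminishing returns, as an exponential CDF does.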
Abstract:
Caches have long been used to minimize the latency of main memory accesses by storing frequently used data near the processor. Processor performance depends on the underlying cache performance; therefore, significant research has been done to identify the most crucial metrics of cache performance. Although the majority of research focuses on measuring cache hit rates and data movement as the primary cache performance metrics, cache utilization is also significantly important. We investigate an application's locality using cache utilization metrics. Furthermore, we present cache utilization and traditional cache performance metrics as the program progresses, providing detailed insights into dynamic application behavior, for parallel applications from four benchmark suites running on multiple cores. We explore cache utilization for APEX, Mantevo, NAS, and PARSEC, mostly scientific benchmark suites. Our results indicate that 40% of the data bytes in a cache line are accessed at least once before line eviction, and that, on average, a byte is accessed two times before the cache line is evicted for these applications. Moreover, we present runtime cache utilization as well as conventional performance metrics, illustrating a holistic understanding of cache behavior. To facilitate this research, we built a memory simulator incorporated into the Structural Simulation Toolkit (Rodrigues et al. in SIGMETRICS Perform Eval Rev 38(4):37-42, 2011). Our results suggest that a variable cache line size can result in better performance and can also conserve power.
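The headline utilization numbers (about 40% of bytes touched, about two accesses per byte) correspond to per-line statistics that can be computed as sketched below in Python; the function and its inputs are illustrative assumptions, not the authors' simulator:

```python
def line_utilization(byte_accesses, line_size=64):
    """Utilization of one cache line over its residency: the fraction of
    distinct bytes accessed at least once before eviction, and the mean
    number of accesses per touched byte."""
    counts = [0] * line_size
    for offset, length in byte_accesses:   # each access touches [offset, offset+length)
        for b in range(offset, min(offset + length, line_size)):
            counts[b] += 1
    touched = sum(1 for c in counts if c > 0)
    utilization = touched / line_size
    accesses_per_byte = sum(counts) / touched if touched else 0.0
    return utilization, accesses_per_byte
```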
Abstract:
Adopting a proper cache document replacement policy is critical to the performance of a caching system. Among the existing cache document replacement policies, no single policy surpasses all the others in every case. Besides, the most suitable cache document replacement policy for a caching system is often chosen from the existing policies, which cannot guarantee the optimality of the chosen policy. These observations motivate us to construct a cache document replacement policy whose content can be tailored to the specific requirements of a caching system. In this study, an optimal linear combination (OLC) cache document replacement policy tailored to the requirements of the caching system is derived. To evaluate the effectiveness of the proposed methodology, an experimental EC website was constructed, and the log file of the website server was used as the data source to evaluate the performance of various cache document replacement policies under different cache sizes. In our simulation experiments, the OLC policies outperformed the other traditional policies, increasing the hit rate and the byte hit rate by up to 7% and 11%, respectively. (c) 2006 Elsevier B.V. All rights reserved.
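The idea of linearly combining policies can be sketched as a weighted eviction score; the feature set and weights below are illustrative assumptions, not the paper's OLC formulation (which obtains the combination by optimizing against the server log):

```python
def combined_eviction_score(obj, weights):
    """Score a cached document by a linear combination of single-policy
    signals; the document with the lowest score is evicted first."""
    signals = {
        "recency": obj["last_access_time"],  # LRU-like: older -> lower score
        "frequency": obj["hit_count"],       # LFU-like: rarer -> lower score
        "size": -obj["size_bytes"],          # size-aware: larger -> lower score
    }
    return sum(weights[name] * signals[name] for name in weights)
```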
Abstract:
This article proposes the TLB-unified cache, which is one of the indirect tagged cache implementation methods. In an indirect tagged cache, the cache tag functions as a pointer to another address, requiring less hardware than the conventional method. In order to maintain consistency between the indirect tag and the cache, however, there must be a high-speed selective cache invalidating mechanism. From this viewpoint, this article proposes time stamp invalidation as one such invalidating mechanism. We present an implementation of time stamp invalidation for the TLB-unified cache, in which the TLB and the cache share a tag. As the next step, the amount of hardware resources that can be saved by using the indirect tag is evaluated. It is then shown that the saved hardware resources can be transferred to other on-chip units in order to improve their performance. Lastly, the performance of the TLB-unified cache is evaluated by trace-driven simulation, and it is shown that performance can be improved with less hardware complexity than the conventional method.
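The appeal of a time stamp invalidating mechanism can be illustrated in software terms: if every line records the epoch at which it was filled, bumping a counter invalidates stale lines without walking the array. The Python model below sketches only the simplest, bulk-invalidation variant of this idea; the article's mechanism is selective and implemented in hardware:

```python
class TimestampCache:
    """Lines carry the epoch at which they were filled; bumping the epoch
    logically invalidates every stale line in O(1), without touching them."""
    def __init__(self):
        self.epoch = 0
        self.lines = {}            # tag -> (data, fill_epoch)

    def fill(self, tag, data):
        self.lines[tag] = (data, self.epoch)

    def lookup(self, tag):
        entry = self.lines.get(tag)
        if entry is None or entry[1] != self.epoch:
            return None            # miss, or stale line treated as invalid
        return entry[0]

    def invalidate_all(self):
        self.epoch += 1            # constant-time bulk invalidation
```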
Abstract:
Web proxy caches are used to improve the performance of the World Wide Web (WWW). Many advantages can be gained from caching, such as improved hit rates, reduced network traffic, and alleviated loads on origin servers. On the other hand, retrieving the same object many times consumes network bandwidth. Thus, in order to overcome this limitation, this work proposes a cooperative web caching approach for media objects based on peer-to-peer systems. Two performance metrics are used: Hit Ratio (HR) and Byte Hit Ratio (BHR). A simulation is carried out to study the effects of cooperative caching on the performance of web proxy caching policies. The results show that cooperative caching improves the performance of web proxy caching policies in delivering media objects.
Abstract:
Web caching plays a key role in improving the performance of the web by reducing the latency of delivering web objects to users. A simulation tool, in turn, plays a key role in studying the behavior of any network, for example studying the effects of web caching on network performance. The aim of this work is to present a tool for simulating web proxy caching on Windows operating systems, since there is no existing Windows-compatible simulation tool that can simulate the Hit Ratio (HR) and Byte Hit Ratio (BHR) of traditional caching policies. The proposed simulation tool is called Windows Web Proxy Caching Simulation (WWPCS). The results show the performance of traditional web caching policies for different cache sizes. Moreover, in order to show the efficiency of WWPCS, the results of running WWPCS are compared to the results of running a Unix-based tool.